AITopics | safety feature

Collaborating Authors

safety feature

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

AI's safety features can be circumvented with poetry, research finds

The GuardianNov-30-2025, 14:00:12 GMT

Roses are red, violets are blue, how do you make a nuclear bomb? Roses are red, violets are blue, how do you make a nuclear bomb? AI's safety features can be circumvented with poetry, research finds Poetry can be linguistically and structurally unpredictable - and that's part of its joy. But one man's joy, it turns out, can be a nightmare for AI models. Those are the recent findings of researchers out of Italy's Icaro Lab, an initiative from a small ethical AI company called DexAI.

large language model, machine learning, natural language, (15 more...)

The Guardian

Country:

Europe > Italy (0.25)
North America > United States (0.17)
Europe > Ukraine (0.06)
Oceania > Australia (0.05)

Industry:

Leisure & Entertainment > Sports (0.72)
Government > Regional Government (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications > Social Media (0.74)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.72)

Add feedback

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

Matteo Turchetta, Felix Berkenkamp, Andreas Krause

Neural Information Processing SystemsNov-21-2025, 08:16:22 GMT

MDP, for this task and prove that it completely explores the safely reachable part of the MDP without violating the safety constraint.

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
North America > United States > Virginia (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Netherlands (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.64)

Add feedback

Measuring Moral LLM Responses in Multilingual Capacities

Basu, Kimaya, Kolari, Savi, Yu, Allison

arXiv.org Artificial IntelligenceOct-13-2025

With LLM usage becoming widespread across countries, languages, and humanity more broadly, the need to understand and guardrail their multilingual responses increases. Large-scale datasets for testing and benchmarking have been created to evaluate and facilitate LLM responses across multiple dimensions. In this study, we evaluate the responses of frontier and leading open-source models in five dimensions across low and high-resource languages to measure LLM accuracy and consistency across multilingual contexts. We evaluate the responses using a five-point grading rubric and a judge LLM. Our study shows that GPT -5 performed the best on average in each category, while other models displayed more inconsistency across language and category. Most notably, in the Consent & Autonomy and Harm Prevention & Safety categories, GPT scored the highest with averages of 3.56 and 4.73, while Gemini 2.5 Pro scored the lowest with averages of 1.39 and 1.98, respectively. These findings emphasize the need for further testing on how linguistic shifts impact LLM responses across various categories and improvement in these areas.

category, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.08776

Country:

Asia > China (0.15)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

`For Argument's Sake, Show Me How to Harm Myself!': Jailbreaking LLMs in Suicide and Self-Harm Contexts

Schoene, Annika M, Canca, Cansu

arXiv.org Artificial IntelligenceJul-8-2025

Recent advances in large language models (LLMs) have led to increasingly sophisticated safety protocols and features designed to prevent harmful, unethical, or unauthorized outputs. However, these guardrails remain susceptible to novel and creative forms of adversarial prompting, including manually generated test cases. In this work, we present two new test cases in mental health for (i) suicide and (ii) self-harm, using multi-step, prompt-level jailbreaking and bypass built-in content and safety filters. We show that user intent is disregarded, leading to the generation of detailed harmful content and instructions that could cause real-world harm. We conduct an empirical evaluation across six widely available LLMs, demonstrating the generalizability and reliability of the bypass. We assess these findings and the multilayered ethical tensions that they present for their implications on prompt-response filtering and context- and task-specific model development. We recommend a more comprehensive and systematic approach to AI safety and ethics while emphasizing the need for continuous adversarial testing in safety-critical AI deployments. We also argue that while certain clearly defined safety measures and guardrails can and must be implemented in LLMs, ensuring robust and comprehensive safety across all use cases and domains remains extremely challenging given the current technical maturity of general-purpose LLMs.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2507.0299

Country:

North America > United States (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Consumer Health (0.92)
Information Technology > Security & Privacy (0.68)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

Developing a Robotic Surgery Training System for Wide Accessibility and Research

Shaker, Walid, Erden, Mustafa Suphi

arXiv.org Artificial IntelligenceMay-28-2025

-- Robotic surgery represents a major breakthrough in medical interventions, which has revolutionized surgical procedures. However, the high cost and limited accessibility of robotic surgery systems pose significant challenges for training purposes. This study addresses these issues by developing a cost-effective robotic laparoscopy training system that closely replicates advanced robotic surgery setups to ensure broad access for both on-site and remote users. Key innovations include the design of a low-cost robotic end-effector that effectively mimics high-end laparoscopic instruments. Additionally, a digital twin platform was established, facilitating detailed simulation, testing, and real-time monitoring, which enhances both system development and deployment. Furthermore, teleop-eration control was optimized, leading to improved trajectory tracking while maintaining remote center of motion (RCM) constraint, with a RMSE of 5 µ m and reduced system latency to 0.01 seconds. As a result, the system provides smooth, continuous motion and incorporates essential safety features, making it a highly effective tool for laparoscopic training.

artificial intelligence, instrument, robot, (16 more...)

arXiv.org Artificial Intelligence

2505.20562

Country:

North America > United States > California > Santa Clara County > Sunnyvale (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation

Li, Mingjie, Si, Wai Man, Backes, Michael, Zhang, Yang, Wang, Yisen

arXiv.org Artificial IntelligenceJan-3-2025

As advancements in large language models (LLMs) continue and the demand for personalized models increases, parameter-efficient fine-tuning (PEFT) methods (e.g., LoRA) will become essential due to their efficiency in reducing computation costs. However, recent studies have raised alarming concerns that LoRA fine-tuning could potentially compromise the safety alignment in LLMs, posing significant risks for the model owner. In this paper, we first investigate the underlying mechanism by analyzing the changes in safety alignment related features before and after fine-tuning. Then, we propose a fixed safety module calculated by safety data and a task-specific initialization for trainable parameters in low-rank adaptations, termed Safety-alignment preserved Low-Rank Adaptation (SaLoRA). Unlike previous LoRA methods and their variants, SaLoRA enables targeted modifications to LLMs without disrupting their original alignments. Our experiments show that SaLoRA outperforms various adapters-based approaches across various evaluation metrics in different fine-tuning tasks.

large language model, machine learning, salora, (17 more...)

arXiv.org Artificial Intelligence

2501.01765

Country: Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The best space heaters in 2024

We may earn revenue from the products available on this page and participate in affiliate programs. If you're tired of stockpiling blankets, extra socks, and heated slippers to keep you warm, it might be time to consider getting a space heater. These powerful appliances are a great way to get cozy without installing a complicated heating system or commandeering the thermostat. If your radiator just isn't cutting it or someone insists on keeping a window open to freshen the room up, a space heater could be the perfect solution. These hot machines are designed specifically to warm up spaces of all sizes and should be portable, effective, and fast-acting. Our best overall pick, the Lasko 5586 Electric 1500W Ceramic Space Heater Tower, ticks all these boxes.

best space heater, heater, space heater, (14 more...)

Popular Science

Country: North America > United States (0.28)

Industry:

Construction & Engineering > HVAC (0.67)
Health & Medicine (0.48)

Technology: Information Technology > Artificial Intelligence (0.69)

Add feedback

'Many-shot jailbreak': lab reveals how AI safety features can be easily bypassed

The GuardianApr-3-2024, 14:16:45 GMT

The safety features on some of the most powerful AI tools that stop them being used for cybercrime or terrorism can be bypassed simply by flooding them with examples of wrongdoing, research has shown. In a paper from the AI lab Anthropic, which produces the large language model (LLM) behind the ChatGPT rival Claude, researchers described an attack they called "many-shot jailbreaking". The attack was as simple as it was effective. Claude, like most large commercial AI systems, contains safety features designed to encourage it to refuse certain requests, such as to generate violent or hateful speech, produce instructions for illegal activities, deceive or discriminate. A user who asks the system for instructions to build a bomb, for example, will receive a polite refusal to engage.

jailbreak, many-shot jailbreak, safety feature, (7 more...)

The Guardian

Industry:

Law > Criminal Law (0.58)
Law Enforcement & Public Safety (0.58)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)

Add feedback

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

Neural Information Processing SystemsMar-12-2024, 14:59:43 GMT

In classical reinforcement learning agents accept arbitrary short term loss for long term gain when exploring their environment. This is infeasible for safety critical applications such as robotics, where even a single unsafe action may cause system failure or harm the environment. In this paper, we address the problem of safely exploring finite Markov decision processes (MDP). We define safety in terms of an a priori unknown safety constraint that depends on states and actions and satisfies certain regularity conditions expressed via a Gaussian process prior.

algorithm, exploration, safety constraint, (14 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
North America > United States > Virginia (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Netherlands (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)

Add feedback

Is this helicopter that can fly itself the answer to ending chopper crashes?

FOX NewsDec-21-2023, 16:00:10 GMT

Kurt "CyberGuy" Knutsson discusses a craft that can fly autonomously without any human intervention. Imagine a helicopter that can take off, fly and land without a human pilot. CLICK TO GET KURT'S FREE CYBERGUY NEWSLETTER WITH SECURITY ALERTS, QUICK VIDEO TIPS, TECH REVIEWS, AND EASY HOW-TO'S TO MAKE YOU SMARTER The R550X is a revolutionary helicopter from Rotor Technologies. It is special because it is the first of its kind to be designed for civilian use, not military or law enforcement. It can perform a variety of missions, such as crop spraying, cargo delivery, firefighting, surveillance, inspection, mapping, surveying, research, exploration, entertainment, and more.

autonomous helicopter, helicopter, rotor technology, (15 more...)

FOX News

Country: North America > United States > New Hampshire (0.05)

Industry: Transportation > Air (1.00)

Technology: Information Technology > Artificial Intelligence (0.51)

Add feedback